Abstract. A geometric model of sparse signal representations is introduced
for classes of signals. It is computed by optimizing co-occurrence
groups with a maximum likelihood estimate calculated with a Bernoulli
mixture model. Applications to face image compression and MNIST digit
classification illustrate the applicability of this model.

1 Introduction 

Finding image representations that reduce dimensionality while maintaining
relevant information for classification remains a major issue. Effective
approaches have recently been developed based on locally orderless
representations, as proposed by Koenderink and Van Doorn [1]. They observed that
high-frequency structures are important for recognition but do not need to be
precisely located. This idea has inspired a family of descriptors such as SIFT [2]
or HOG [5], which delocalize the image information over large neighborhoods by
only recording histogram information. These histograms are usually computed over
wavelet-like coefficients, providing a multiscale image representation with
several wavelets having different orientation tunings.

This paper introduces a new geometric image representation obtained by
grouping coefficients that have co-occurrence properties across an image class.
It provides a locally orderless representation where sparse descriptors are
delocalized over groups which optimize the coefficient co-occurrences, and can be
interpreted as a form of parcellization [4]. Section 2 reviews wavelet image
representations and the notion of sparse geometry through significance sets.
Section 3 introduces our co-occurrence grouping model, which is optimized with a
maximum likelihood approach. Groups are computed from a training sequence in
Section 4 using a Bernoulli mixture approximation. Applications to face image
compression are shown in Section 5, and the application of this representation is
illustrated for MNIST image classification in Section 6.

2 Geometric Significance Set 

Sparse signal representations are obtained by decomposing signals over bases or
frames $\{\phi_p\}_{p \in \Gamma}$ which take advantage of the signal regularity
to produce many zero coefficients. A sparse representation is obtained by keeping
the significant coefficients above a threshold $T$,

$$\Lambda = \{\, p \in \Gamma \,:\, |\langle f, \phi_p \rangle| > T \,\} .$$

The original signal can be reconstructed with a dual family
$\{\tilde\phi_p\}_{p \in \Gamma}$,
$f = \sum_{p \in \Gamma} \langle f, \phi_p \rangle\, \tilde\phi_p$,
and the resulting sparse approximation is
$f_\Lambda = \sum_{p \in \Lambda} \langle f, \phi_p \rangle\, \tilde\phi_p$.
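
As a minimal sketch of this thresholding step, assume an orthonormal basis stored
as the rows of a NumPy array, so that the dual family coincides with the basis
itself. The function name `sparse_approximation`, the random basis and the test
signal are illustrative assumptions and do not come from the experiments of this
paper.

```python
import numpy as np

def sparse_approximation(f, basis, T):
    """Return the significance indicator and the sparse approximation of f.

    basis: (N, N) array whose rows are orthonormal vectors phi_p, so the
    dual family equals the basis (an assumption of this sketch).
    """
    coeffs = basis @ f                       # <f, phi_p> for every p
    significant = np.abs(coeffs) > T         # indicator of the significance set
    f_approx = basis.T @ (coeffs * significant)
    return significant, f_approx

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((64, 64)))   # random orthonormal basis
# Signal with two large coefficients plus a small perturbation.
f = 3.0 * basis[5] - 2.0 * basis[40] + 0.01 * rng.standard_normal(64)
significant, f_approx = sparse_approximation(f, basis, T=0.5)
```

With a redundant frame instead of an orthonormal basis, the reconstruction line
would have to use the dual frame vectors rather than `basis.T`.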

Wavelet transforms compute signal inner products with several mother wavelets
$\psi^d$ having a specific direction tuning, and which are dilated by $2^j$ and
translated by $2^j n$: $\phi_p = \psi^d_{j,n}$. Separable wavelet bases are
obtained with 3 mother wavelets [5], in which case the total number $|\Gamma|$ of
wavelets is equal to the image size.

Let $|\Lambda|$ be the cardinal of the set $\Lambda$. In the absence of prior
information on $\Lambda$, the number of bits needed to code $\Lambda$ in $\Gamma$
is $R_0 = \log_2 \binom{|\Gamma|}{|\Lambda|}$.

One can also verify [5] that the number of bits required to encode the values
of coefficients in $\Lambda$ is proportional to $|\Lambda|$ and is smaller than
$R_0$, so that the coding budget is indeed dominated by $R_0$, which carries most
of the image information.
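
The cost of coding a significance set without prior information can be evaluated
directly from the binomial coefficient. The sketch below uses only the Python
standard library. The helper name `support_coding_cost` and the sizes 1024 and 50
are hypothetical values chosen for illustration.

```python
from math import comb, log2

def support_coding_cost(n_total, n_significant):
    """Bits needed to index one subset of n_significant positions among n_total."""
    return log2(comb(n_total, n_significant))

# Illustrative sizes: 1024 coefficients, 50 of which are significant.
R0 = support_coding_cost(1024, 50)
bits_per_significant_coefficient = R0 / 50
```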

3 Co-occurrence Groups 

In a supervised classification problem, a geometric model defines a prior model
of the probability distribution $q(\Lambda)$. There is a huge number
$2^{|\Gamma|}$ of subsets $\Lambda$ in $\Gamma$. Estimating the probability
$q(\Lambda)$ from a limited training set thus requires using a simplified prior
model.

A signal class is represented by a random vector whose realizations are within
the class and whose significance sets $\Lambda$ are included in $\Gamma$. A
mixture model is introduced with co-occurrence groups $\theta(k)$ of constant
size $s$, which define a partition of the overall index set $\Gamma$:

$$\Gamma = \bigcup_k \theta(k) \quad\text{with}\quad |\theta(k)| = s \quad\text{and}\quad \theta(k) \cap \theta(k') = \emptyset \ \text{ if } k \neq k' .$$

Co-occurrence groups $\theta(k)$ are optimized by enforcing that all coefficients
have a similar behavior in a group, and hence that $\Lambda \cap \theta(k)$ is
either almost empty or almost equal to $\theta(k)$ with a high probability. The
mixture model assumes that the distributions of the components
$\Lambda \cap \theta(k)$ are independent. The distribution
$q(\Lambda \cap \theta(k))$ is assumed to be uniform among all subsets of
$\theta(k)$ of cardinal $z(k) = |\Lambda \cap \theta(k)|$. Let $q_k(z(k))$ be the
distribution of this cardinal; then

$$q(\Lambda \mid \theta) = \prod_k q(\Lambda \cap \theta(k)) = \prod_k \binom{s}{z(k)}^{-1} q_k(z(k)) .$$

This co-occurrence model is identified with a maximum log-likelihood approach
which computes

$$\arg\max_{\theta} \sum_k \left( -\log_2 \binom{s}{z(k)} + \log_2 q_k(z(k)) \right) .$$
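
A small worked example may help to see what this maximization rewards. The values
are illustrative assumptions and are not taken from the experiments: an index set
of four coefficients, groups of size $s = 2$, a uniform $q_k(z) = 1/3$ for
$z \in \{0, 1, 2\}$, and a significance set made of the first two coefficients.

```latex
% Index set \Gamma = \{1,2,3,4\}, significance set \Lambda = \{1,2\}, s = 2,
% q_k(z) = 1/3 for z \in \{0,1,2\} (illustrative values only).
\begin{align*}
\theta(1)=\{1,2\},\ \theta(2)=\{3,4\}: \quad
  & z = (2, 0), \quad
    \log_2 q(\Lambda \mid \theta)
    = -\log_2\binom{2}{2} - \log_2\binom{2}{0} + 2\log_2\tfrac{1}{3}
    \approx -3.17 \text{ bits}, \\
\theta'(1)=\{1,3\},\ \theta'(2)=\{2,4\}: \quad
  & z = (1, 1), \quad
    \log_2 q(\Lambda \mid \theta')
    = -2\log_2\binom{2}{1} + 2\log_2\tfrac{1}{3}
    \approx -5.17 \text{ bits}.
\end{align*}
```

The grouping that keeps the two co-occurring coefficients together is 2 bits
cheaper, because each of its groups is either empty or full.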

4 Group Estimation with Bernoulli Mixtures 

Given a training sequence of images $\{f_l\}_{l \le L}$ that belong to a class,
we optimize the group co-occurrence by approximating the maximum likelihood with
a Bernoulli mixture.

Let $\Lambda_l$ be the significance set of $f_l$. The log-likelihood is
calculated with

$$\mathcal{L}(\{f_l\}_{l \le L}, \theta) = \sum_l \sum_k \left( -\log_2 \binom{s}{z_l(k)} + \log_2 q_k(z_l(k)) \right) \quad\text{with}\quad z_l(k) = |\Lambda_l \cap \theta(k)| . \qquad (1)$$

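The maximization of this expression is obtained using the Stirling formula, which
approximates the first term by the entropy of a Bernoulli distribution. Let
us write $q_{k,l}(1) = z_l(k)/s$ and $q_{k,l}(0) = 1 - z_l(k)/s$, the Bernoulli
probability distribution associated to $z_l(k)/s$. Let us specify the groups
$\theta(k)$ by the inverse variables $k(p)$ such that $p \in \theta(k(p))$. It
results that

$$\sum_k -\log_2 \binom{s}{z_l(k)} \approx \sum_k \left( z_l(k) \log_2 \frac{z_l(k)}{s} + (s - z_l(k)) \log_2\!\left(1 - \frac{z_l(k)}{s}\right) \right) = \sum_{p \in \Gamma} \log_2 q_{k(p),l}(1_{\Lambda_l}(p)) ,$$

where $1_{\Lambda_l}(p)$ equals 1 if $p \in \Lambda_l$ and 0 otherwise.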
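
The quality of this Stirling approximation can be checked numerically. The sketch
below compares the exact value of $\log_2 \binom{s}{z}$ with the entropy term
$s\,H(z/s)$. The function names and the group sizes 16 and 1024 are illustrative
assumptions.

```python
from math import comb, log2

def binomial_bits(s, z):
    """Exact value of log2 C(s, z)."""
    return log2(comb(s, z))

def entropy_bits(s, z):
    """s * H(z/s): the Stirling approximation of log2 C(s, z)."""
    if z == 0 or z == s:
        return 0.0
    r = z / s
    return -z * log2(r) - (s - z) * log2(1 - r)

# Ratio exact / approximate for a small and a large group at the same density.
ratio_small = binomial_bits(16, 4) / entropy_bits(16, 4)
ratio_large = binomial_bits(1024, 256) / entropy_bits(1024, 256)
```

The entropy term is an upper bound on the exact value, and the relative gap
shrinks as the group size grows, so the approximation is coarser for the small
groups considered in the experiments than for large ones.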

The distribution $q_k$ is generally unknown and must therefore be estimated.
The estimation is regularized by approximating this distribution with a
piecewise constant distribution $q_k$ over a fixed number of quantization bins
that is small relative to the number of realizations $L$. The likelihood (1) is
thus approximated by a likelihood over the Bernoulli mixture, which is optimized
over all parameters:

$$\arg\min \; -\sum_l \left( \sum_{p \in \Gamma} \log_2 q_{k(p),l}(1_{\Lambda_l}(p)) + \sum_k \log_2 q_k(z_l(k)) \right) . \qquad (2)$$

The following algorithm minimizes (2) by updating separately the Bernoulli
parameters $z_l(k)$, the distributions $q_k$ and the grouping variables $k(p)$.

The minimization algorithm begins with a random initialization of groups
$\theta(k)$ of the same size $s$. The empirical histograms $q_k$ are initialized
to uniform distributions. The algorithm iterates the following steps:

• Step 1: Given $\{\theta(k)\}_k$ and $\{q_k\}_k$, compute the parameters
$\{z_l(k)\}_{l,k}$ which minimize (2) by minimizing

$$-\log_2 q_k(z_l(k)) - z_l(k) \log_2 \frac{z_l(k)}{s} - (s - z_l(k)) \log_2\!\left(1 - \frac{z_l(k)}{s}\right) . \qquad (3)$$

• Step 2: Update $\{q_k\}_k$ to minimize (2) as the normalized histogram of the
updated parameters $\{z_l(k)\}_l$ over a predefined number of bins.

• Step 3: Update the group indexes $\{k(p)\}_p$ to minimize (2) by minimizing

$$-\sum_l \sum_{p \in \Gamma} \log_2 q_{k(p),l}(1_{\Lambda_l}(p)) , \qquad (4)$$

for groups of constant size $|\theta(k)| = s$.

This algorithm is guaranteed to converge to a local maximum because each step
increases the log-likelihood or leaves it unchanged. In fact, it is the
equivalent of the K-means algorithm adapted to the mixture model considered here.
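
The three steps can be sketched as follows. This is a hedged illustration, not
the implementation used in the experiments. It rests on several assumptions:
the parameters $z_l(k)$ take integer values in $\{0, \dots, s\}$, there is one
histogram bin per value, the ratios $z/s$ are clipped by a hypothetical `EPS` to
keep logarithms finite, and the Bernoulli log terms in Step 1 are weighted by
the empirical counts $|\Lambda_l \cap \theta(k)|$. Step 3 is solved as a
balanced assignment problem with SciPy's `linear_sum_assignment`, which is one
possible choice. All function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

EPS = 1e-3  # keeps log2(z/s) and log2(1 - z/s) finite at z = 0 and z = s

def bernoulli_logs(s):
    """log2 q(1) = log2(z/s) and log2 q(0) = log2(1 - z/s) for z = 0..s (clipped)."""
    r = np.clip(np.arange(s + 1) / s, EPS, 1 - EPS)
    return np.log2(r), np.log2(1 - r)

def safe_log2(q):
    with np.errstate(divide="ignore"):
        return np.log2(q)                       # empty bins give -inf

def counts(X, groups, K):
    """n[l, k]: number of significant coefficients of image l in group k."""
    return X.astype(int) @ np.eye(K, dtype=int)[groups]

def objective(X, groups, z, q, s):
    """Value of (2) for boolean maps X (L, N), labels k(p), parameters z, q."""
    log1, log0 = bernoulli_logs(s)
    n = counts(X, groups, q.shape[0])
    prior = safe_log2(q)[np.arange(q.shape[0]), z]   # log2 q_k(z_l(k)), (L, K)
    return -(n * log1[z] + (s - n) * log0[z] + prior).sum()

def update_z(X, groups, q, s):                  # Step 1
    log1, log0 = bernoulli_logs(s)
    n = counts(X, groups, q.shape[0])[:, :, None]
    cost = -(safe_log2(q)[None] + n * log1 + (s - n) * log0)   # (L, K, s + 1)
    return cost.argmin(axis=2)

def update_q(z, s):                             # Step 2: normalized histograms
    hist = np.stack([np.bincount(z[:, k], minlength=s + 1) for k in range(z.shape[1])])
    return hist / hist.sum(axis=1, keepdims=True)

def update_groups(X, z, s):                     # Step 3: balanced assignment
    log1, log0 = bernoulli_logs(s)
    Xf = X.astype(float)
    cost = -(Xf.T @ log1[z] + (1 - Xf).T @ log0[z])  # (N, K): cost of p in group k
    rows, cols = linear_sum_assignment(np.repeat(cost, s, axis=1))  # s slots per group
    groups = np.empty(X.shape[1], dtype=int)
    groups[rows] = cols // s
    return groups

def fit_groups(X, s, n_iter=10, seed=0):
    K = X.shape[1] // s
    rng = np.random.default_rng(seed)
    groups = rng.permutation(np.repeat(np.arange(K), s))   # random groups of size s
    q = np.full((K, s + 1), 1.0 / (s + 1))                  # uniform histograms
    history = []
    for _ in range(n_iter):
        z = update_z(X, groups, q, s)
        q = update_q(z, s)
        groups = update_groups(X, z, s)
        history.append(objective(X, groups, z, q, s))
    return groups, q, history

# Synthetic data: 4 planted groups of 4 coefficients, each switched on or off
# per image, with 5% of the entries flipped.
rng = np.random.default_rng(1)
true_groups = np.repeat(np.arange(4), 4)
active = rng.random((200, 4)) < 0.5
X = active[:, true_groups] ^ (rng.random((200, 16)) < 0.05)
groups, q, history = fit_groups(X, s=4)
```

In this sketch each step is an exact minimization over one block of variables,
so the values stored in `history` should not increase from one iteration to the
next. Whether the planted groups are recovered depends on the initialization,
as with K-means.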



5 Face Compression 



To illustrate the efficiency of this grouping strategy, it is first applied to the
compression of face images that have been approximately registered. A database
of 170 face images was used for training and a different set of 30 face images
was used for testing. Figure 1 shows the optimal co-occurrence groups obtained
over wavelet coefficients by applying the maximum log-likelihood algorithm on
the training set. The encoding cost of the significance map using the optimized
model is equal to minus the log-likelihood of this model. Figure 2 shows the
evolution of the average bit budget needed to encode the significance maps with
the Bernoulli mixture over optimized co-occurrence groups, depending upon the
group size $s$. The optimal group size, which maximizes the log-likelihood and
hence minimizes the encoding cost over all group sizes, is $s = 16$.




Fig. 1: (a): Images of wavelet coefficients $|\langle f, \psi^d_{j,n} \rangle|$
for three directions $d = 1, 2, 3$ at a scale $2^j$. (b): thresholded
coefficients, defining the significance maps $\Lambda_l$. (c): grouping obtained
with optimal group size $s = 16$. The stable geometric features are clearly
visible.




Fig. 2: Solid: bit rate using fixed square groups of size $s$ as a function of
$\log_2 s$. Dashed: bit rate (equal to minus the log-likelihood, in bits per
pixel) using the optimal groups of size $s$ as a function of $\log_2 s$.

Fig. 3: Digit example. From left to right: original digit taken from the MNIST
database, random digit, significance map, and grouping obtained by using the
described algorithm.



When $s$ is equal to the image size, there is a single group and the encoding
is thus equivalent to a standard image coding using no prior information on the
class. The bit rate is also compared with a Bernoulli mixture computed with a
partition into square groups $\theta(k)$, as a function of $s$. Figure 2 shows
that the optimized co-occurrence grouping improves the bit rate by 20% relative
to the case where there is a single group, and also with respect to the fixed
square groups, which means that the optimal grouping provides geometric
information which is stable across the image class. The optimal group size
$s = 16$ also gives an estimation of the image deformations that are due to
variations of scaling and eye positions and to intrinsic variations of faces in
the database.
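
The fixed square partition used as a baseline can be built in a few lines. The
sketch below assumes that coefficients are indexed in row-major order over a
`height` x `width` grid whose dimensions are multiples of the square side. The
function name `square_groups` is hypothetical.

```python
import numpy as np

def square_groups(height, width, side):
    """Group label of each position when the grid is tiled by side x side squares."""
    rows, cols = np.divmod(np.arange(height * width), width)
    return (rows // side) * (width // side) + cols // side

# Groups of size s = side**2 = 4 on a 4 x 4 grid.
labels = square_groups(4, 4, 2).reshape(4, 4)
```

The resulting label array has the same form as the grouping variables $k(p)$, so
the same likelihood can be evaluated for square and optimized groups.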

6 Random MNIST Digit Classification 

This section shows the classification ability of our geometric representation
despite the presence of strong variability in the images. The test is performed
using the standard MNIST database of digits. This database is relatively simple:
without any modification of the image representation, an SVM classifier can
reach an error rate of 1.4% with a training set of 60,000 images. This section
shows that our geometric co-occurrence model can learn with far fewer training
examples and for more complex images.

To take into account texture variation phenomena, which are a central difficulty
for geometric models, a white noise texture is introduced. A digit image
$f[n]$ is transformed into a random digit $\tilde f[n] = (f[n] + C)\, W[n]$,
where $W[n]$ is a normalized Gaussian white noise. The significance maps of these
digits are simply obtained with a thresholding, as shown in Figure 3. It yields a
binary image with a low-density binary texture on the digit background and a
high-density texture on the digit support. Visually, the digit is still perfectly
recognizable despite the texture variability. With 4000 training images, an SVM
with a polynomial kernel yields a very low recognition rate of 21% on a different
set of 10000 test images.
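
The construction of a random digit and of its significance map can be sketched
as follows. The constant `C = 0.3`, the threshold `T = 0.5`, the thresholding of
absolute pixel values and the synthetic bar used in place of an MNIST digit are
all assumptions made for illustration, since the values used in the experiments
are not stated here.

```python
import numpy as np

def random_digit_significance(f, C, T, rng):
    """Significance map of the textured image (f + C) * W, with W ~ N(0, 1) i.i.d."""
    W = rng.standard_normal(f.shape)       # normalized Gaussian white noise
    textured = (f + C) * W
    return np.abs(textured) > T            # binary significance map

rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[:, 11:17] = 1.0                      # synthetic vertical bar, not an MNIST digit
significant = random_digit_significance(digit, C=0.3, T=0.5, rng=rng)

density_on_support = significant[digit > 0].mean()
density_on_background = significant[digit == 0].mean()
```

The noise amplitude is larger on the digit support than on the background, so
the thresholded map is dense on the support and sparse elsewhere.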

Figure 3 shows the optimal co-occurrence groups of size 14 computed with
the minimization algorithm of Section 4. Despite the geometric variability, the
algorithm is able to extract co-occurrence groups that do correspond to the
digit structures and their deformations. To each digit $0 \le d \le 9$
corresponds an optimized co-occurrence grouping $\theta_d$. Let
$\mathcal{L}(\Lambda, \theta_d)$ be the likelihood of the significance map
$\Lambda$ of $f$ with the grouping model $\theta_d$. An SVM classifier is trained
on the feature vector
$\{\mathcal{L}(\Lambda, \theta_d(k))\}_{0 \le d \le 9,\, 0 \le k < K}$, of
dimension $10 \cdot 56$, with groups of size 14. With 4000 training images, this
classifier yields a recognition rate of 9% on a different set of 10000 test
images. A simple maximum likelihood classifier (MAP) associates to each test
image $f$ the digit class

$$\hat d = \arg\max_{0 \le d \le 9} \mathcal{L}(f, \theta_d) .$$

With 4000 training examples, this simple classifier yields a recognition rate of
18% for random digits, which is already better than the SVM applied on the
original pixels.
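
Both classifiers use the same per-group log-likelihood terms. The sketch below
computes the feature vector and the maximum likelihood decision for a list of
class models. It assumes that each model is a pair `(groups, q)` with strictly
positive histograms `q` and equal-size groups. The function names and the toy
two-class example are illustrative and unrelated to the MNIST experiment.

```python
import numpy as np
from math import comb, log2

def per_group_log_likelihoods(significant, groups, q):
    """Terms -log2 C(s, z(k)) + log2 q_k(z(k)) for k = 0..K-1 (requires q > 0)."""
    K, s = q.shape[0], q.shape[1] - 1
    z = np.bincount(groups[significant], minlength=K)
    return np.array([log2(q[k, z[k]]) - log2(comb(s, int(z[k]))) for k in range(K)])

def features_and_decision(significant, models):
    """Return the concatenated features and the class of maximum log-likelihood."""
    feats = np.stack([per_group_log_likelihoods(significant, g, q) for g, q in models])
    return feats.ravel(), int(feats.sum(axis=1).argmax())

# Toy example: two classes over four coefficients, groups of size 2.
q_uniform = np.full((2, 3), 1.0 / 3.0)
models = [(np.array([0, 0, 1, 1]), q_uniform), (np.array([0, 1, 0, 1]), q_uniform)]
features, decision = features_and_decision(np.array([True, True, False, False]), models)
```

The flattened array `features` is the kind of vector an SVM could be trained on,
while `decision` corresponds to the simple maximum likelihood rule.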

7 Conclusion 

This paper introduces a new approach to define the geometry of a class of im- 
ages computed over a sparse representation, using co-occurrence groups. These 
co-occurrence groups are computed with a maximum log-likelihood estimation
calculated over an optimized Bernoulli mixture model. An algorithm is introduced
to optimize the group computation. The application to face image compres- 
sion shows the efficiency of this encoding approach, and the ability to compute 
co-occurrence groups that provide stable information across the class. A classi- 
fication test is performed over textured digits, which shows that the algorithm 
can take into account texture geometry and provide much better classification 
rates than a standard pixel based image representation. 